{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "regional-affair",
   "metadata": {},
   "source": [
    "# Content Type Decoding\n",
    "\n",
    "MLServer extends the V2 inference protocol by adding support for a `content_type` annotation.\n",
    "This annotation can be provided either through the model metadata `parameters`, or through the input `parameters`.\n",
    "By leveraging the `content_type` annotation, we can give MLServer the information it needs to _decode_ the input payload from the \"wire\" V2 protocol into something meaningful to the model / user (e.g. a NumPy array).\n",
    "\n",
    "This notebook walks through a few examples that illustrate how this works and how it can be extended."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "suburban-damage",
   "metadata": {},
   "source": [
    "## Echo Inference Runtime\n",
    "\n",
    "To start with, we will write a _dummy_ runtime which just prints the raw input and the _decoded_ input, and then returns the input unchanged.\n",
    "This will serve as a testbed to showcase how the `content_type` support works.\n",
    "\n",
    "Later on, we will extend this runtime by adding custom _codecs_ that will decode our V2 payload to custom types."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "alleged-tunnel",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting runtime.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile runtime.py\n",
    "import json\n",
    "\n",
    "from mlserver import MLModel\n",
    "from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput\n",
    "from mlserver.codecs import DecodedParameterName\n",
    "\n",
    "_to_exclude = {\n",
    "    \"parameters\": {DecodedParameterName},\n",
    "    'inputs': {\"__all__\": {\"parameters\": {DecodedParameterName}}}\n",
    "}\n",
    "\n",
    "class EchoRuntime(MLModel):\n",
    "    async def predict(self, payload: InferenceRequest) -> InferenceResponse:\n",
    "        outputs = []\n",
    "        for request_input in payload.inputs:\n",
    "            decoded_input = self.decode(request_input)\n",
    "            print(f\"------ Encoded Input ({request_input.name}) ------\")\n",
    "            as_dict = request_input.dict(exclude=_to_exclude)  # type: ignore\n",
    "            print(json.dumps(as_dict, indent=2))\n",
    "            print(f\"------ Decoded input ({request_input.name}) ------\")\n",
    "            print(decoded_input)\n",
    "            \n",
    "            outputs.append(\n",
    "                ResponseOutput(\n",
    "                    name=request_input.name,\n",
    "                    datatype=request_input.datatype,\n",
    "                    shape=request_input.shape,\n",
    "                    data=request_input.data\n",
    "                )\n",
    "            )\n",
    "        \n",
    "        return InferenceResponse(model_name=self.name, outputs=outputs)\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "impaired-principle",
   "metadata": {},
   "source": [
    "As you can see above, this runtime will decode the incoming payloads by calling the `self.decode()` helper method.\n",
    "This method determines the right content type for each input by checking the following, in order:\n",
    "\n",
    "1. Is there any content type defined in the `inputs[].parameters.content_type` field within the **request payload**?\n",
    "2. Is there any content type defined in the `inputs[].parameters.content_type` field within the **model metadata**?\n",
    "3. Is there any default content type that should be assumed?\n"
   ]
  },
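  {
   "cell_type": "markdown",
   "id": "sketch-lookup-order",
   "metadata": {},
   "source": [
    "The lookup order above can be sketched in plain Python.\n",
    "This is only an illustration that works on plain dictionaries, not MLServer's actual implementation: the `resolve_content_type()` helper and the example input names are hypothetical and were introduced here for the sake of the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-lookup-order-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List, Optional\n",
    "\n",
    "\n",
    "def resolve_content_type(\n",
    "    request_input: dict,\n",
    "    metadata_inputs: List[dict],\n",
    "    default: Optional[str] = None,\n",
    ") -> Optional[str]:\n",
    "    # Hypothetical helper: mirrors the three checks described above\n",
    "    # 1. Content type set explicitly in the request payload\n",
    "    request_params = request_input.get(\"parameters\") or {}\n",
    "    if \"content_type\" in request_params:\n",
    "        return request_params[\"content_type\"]\n",
    "\n",
    "    # 2. Content type declared in the model metadata for the same input name\n",
    "    for metadata_input in metadata_inputs:\n",
    "        if metadata_input.get(\"name\") == request_input.get(\"name\"):\n",
    "            metadata_params = metadata_input.get(\"parameters\") or {}\n",
    "            if \"content_type\" in metadata_params:\n",
    "                return metadata_params[\"content_type\"]\n",
    "\n",
    "    # 3. Otherwise, fall back to the default (if any)\n",
    "    return default\n",
    "\n",
    "\n",
    "# Hypothetical metadata and inputs, for illustration only\n",
    "metadata = [{\"name\": \"my-input\", \"parameters\": {\"content_type\": \"np\"}}]\n",
    "\n",
    "explicit = {\"name\": \"my-input\", \"parameters\": {\"content_type\": \"str\"}}\n",
    "print(resolve_content_type(explicit, metadata))  # request payload wins\n",
    "print(resolve_content_type({\"name\": \"my-input\"}, metadata))  # falls back to metadata\n",
    "print(resolve_content_type({\"name\": \"other-input\"}, metadata))  # no match, default is used"
   ]
  },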
  {
   "cell_type": "markdown",
   "id": "chronic-halloween",
   "metadata": {},
   "source": [
    "### Model Settings\n",
    "\n",
    "In order to enable this runtime, we will also create a `model-settings.json` file.\n",
    "This file should be present in (or accessible from) the folder where we run `mlserver start .`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "accessory-jerusalem",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting model-settings.json\n"
     ]
    }
   ],
   "source": [
    "%%writefile model-settings.json\n",
    "\n",
    "{\n",
    "    \"name\": \"content-type-example\",\n",
    "    \"implementation\": \"runtime.EchoRuntime\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "canadian-somalia",
   "metadata": {},
   "source": [
    "## Request Inputs\n",
    "\n",
    "Our first step will be to specify the content type through the `inputs[].parameters` field of the incoming request.\n",
    "For this, we will start MLServer in the background (e.g. by running `mlserver start .`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "criminal-personality",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "payload = {\n",
    "    \"inputs\": [\n",
    "        {\n",
    "            \"name\": \"parameters-np\",\n",
    "            \"datatype\": \"INT32\",\n",
    "            \"shape\": [2, 2],\n",
    "            \"data\": [1, 2, 3, 4],\n",
    "            \"parameters\": {\n",
    "                \"content_type\": \"np\"\n",
    "            }\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"parameters-str\",\n",
    "            \"datatype\": \"BYTES\",\n",
    "            \"shape\": [11],\n",
    "            \"data\": \"hello world 😁\",\n",
    "            \"parameters\": {\n",
    "                \"content_type\": \"str\"\n",
    "            }\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "\n",
    "response = requests.post(\n",
    "    \"http://localhost:8080/v2/models/content-type-example/infer\",\n",
    "    json=payload\n",
    ")"
   ]
  },
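  {
   "cell_type": "markdown",
   "id": "sketch-manual-decoding",
   "metadata": {},
   "source": [
    "To build some intuition for what the `np` and `str` content types ask for, the sketch below reproduces the decoding by hand with NumPy and plain Python.\n",
    "It is an approximation, under the assumption that `np` reshapes the flat `data` list according to `shape` and that `str` treats the payload as UTF-8 text.\n",
    "It does not call MLServer's own codecs, so the exact objects printed in the server logs may differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-manual-decoding-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Same values as the `parameters-np` input above\n",
    "flat_data = [1, 2, 3, 4]\n",
    "shape = [2, 2]\n",
    "\n",
    "# Assumption: the V2 `INT32` datatype maps to `np.int32`, and the flat\n",
    "# list is reshaped into the tensor described by `shape`\n",
    "as_array = np.array(flat_data, dtype=np.int32).reshape(shape)\n",
    "print(as_array)\n",
    "\n",
    "# Assumption: `BYTES` payloads tagged as `str` are UTF-8 encoded text\n",
    "raw_bytes = \"hello world 😁\".encode(\"utf-8\")\n",
    "print(raw_bytes.decode(\"utf-8\"))"
   ]
  },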
  {
   "cell_type": "markdown",
   "id": "false-blind",
   "metadata": {},
   "source": [
    "### Model Metadata\n",
    "\n",
    "Our next step will be to define the expected content type through the model metadata.\n",
    "This can be done by extending the `model-settings.json` file with an `inputs` section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "enclosed-russia",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting model-settings.json\n"
     ]
    }
   ],
   "source": [
    "%%writefile model-settings.json\n",
    "\n",
    "{\n",
    "    \"name\": \"content-type-example\",\n",
    "    \"implementation\": \"runtime.EchoRuntime\",\n",
    "    \"inputs\": [\n",
    "        {\n",
    "            \"name\": \"metadata-np\",\n",
    "            \"datatype\": \"INT32\",\n",
    "            \"shape\": [2, 2],\n",
    "            \"parameters\": {\n",
    "                \"content_type\": \"np\"\n",
    "            }\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"metadata-str\",\n",
    "            \"datatype\": \"BYTES\",\n",
    "            \"shape\": [11],\n",
    "            \"parameters\": {\n",
    "                \"content_type\": \"str\"\n",
    "            }\n",
    "        }\n",
    "    ]\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "linear-logic",
   "metadata": {},
   "source": [
    "After adding this metadata, we will restart MLServer (e.g. `mlserver start .`) and send a new request without any explicit `parameters`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "independent-yacht",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "payload = {\n",
    "    \"inputs\": [\n",
    "        {\n",
    "            \"name\": \"metadata-np\",\n",
    "            \"datatype\": \"INT32\",\n",
    "            \"shape\": [2, 2],\n",
    "            \"data\": [1, 2, 3, 4],\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"metadata-str\",\n",
    "            \"datatype\": \"BYTES\",\n",
    "            \"shape\": [11],\n",
    "            \"data\": \"hello world 😁\",\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "\n",
    "response = requests.post(\n",
    "    \"http://localhost:8080/v2/models/content-type-example/infer\",\n",
    "    json=payload\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "waiting-samuel",
   "metadata": {},
   "source": [
    "As you should be able to see in the server logs, MLServer will cross-reference the input names against the model metadata to find the right content type."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "documentary-butterfly",
   "metadata": {},
   "source": [
    "### Custom Codecs\n",
    "\n",
    "There may be cases where a custom inference runtime needs to encode to / decode from custom datatypes.\n",
    "As an example, we can think of computer vision models which may only operate on Pillow image objects.\n",
    "\n",
    "In these scenarios, it's possible to extend the `Codec` interface to write our own encoding and decoding logic.\n",
    "A `Codec` is simply an object which defines `decode()` and `encode()` methods.\n",
    "To illustrate how this would work, we will extend our custom runtime to add a custom `PillowCodec`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "simple-reducing",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting runtime.py\n"
     ]
    }
    ],
   "source": [
    "%%writefile runtime.py\n",
    "import json\n",
    "\n",
    "from PIL import Image\n",
    "\n",
    "from mlserver import MLModel\n",
    "from mlserver.types import (\n",
    "    InferenceRequest,\n",
    "    InferenceResponse,\n",
    "    RequestInput,\n",
    "    ResponseOutput\n",
    ")\n",
    "from mlserver.codecs import NumpyCodec, register_input_codec, DecodedParameterName\n",
    "\n",
    "\n",
    "_to_exclude = {\n",
    "    \"parameters\": {DecodedParameterName},\n",
    "    \"inputs\": {\"__all__\": {\"parameters\": {DecodedParameterName}}}\n",
    "}\n",
    "\n",
    "@register_input_codec\n",
    "class PillowCodec(NumpyCodec):\n",
    "    ContentType = \"img\"\n",
    "    DefaultMode = \"L\"\n",
    "    \n",
    "    def encode(self, name: str, payload: Image.Image) -> ResponseOutput:\n",
    "        # Serialise the raw pixel data, mirroring the `Image.frombytes()`\n",
    "        # call in `decode()`. Calling `payload.save()` on a `BytesIO` object\n",
    "        # without a `format` argument would raise a `ValueError`.\n",
    "        raw_bytes = payload.convert(self.DefaultMode).tobytes()\n",
    "        \n",
    "        return ResponseOutput(\n",
    "            name=name,\n",
    "            shape=payload.size,\n",
    "            datatype=\"BYTES\",\n",
    "            data=raw_bytes\n",
    "        )\n",
    "    \n",
    "    def decode(self, request_input: RequestInput) -> Image.Image:\n",
    "        if request_input.datatype != \"BYTES\":\n",
    "            # If not bytes, assume it's an array\n",
    "            image_array = super().decode(request_input)\n",
    "            return Image.fromarray(image_array, mode=self.DefaultMode)\n",
    "        \n",
    "        encoded = request_input.data.__root__\n",
    "        if isinstance(encoded, str):\n",
    "            encoded = encoded.encode()\n",
    "\n",
    "        return Image.frombytes(\n",
    "            mode=self.DefaultMode,\n",
    "            size=request_input.shape,\n",
    "            data=encoded\n",
    "        )\n",
    "\n",
    "class EchoRuntime(MLModel):\n",
    "    async def predict(self, payload: InferenceRequest) -> InferenceResponse:\n",
    "        outputs = []\n",
    "        for request_input in payload.inputs:\n",
    "            decoded_input = self.decode(request_input)\n",
    "            print(f\"------ Encoded Input ({request_input.name}) ------\")\n",
    "            as_dict = request_input.dict(exclude=_to_exclude)  # type: ignore\n",
    "            print(json.dumps(as_dict, indent=2))\n",
    "            print(f\"------ Decoded input ({request_input.name}) ------\")\n",
    "            print(decoded_input)\n",
    "            \n",
    "            outputs.append(\n",
    "                ResponseOutput(\n",
    "                    name=request_input.name,\n",
    "                    datatype=request_input.datatype,\n",
    "                    shape=request_input.shape,\n",
    "                    data=request_input.data\n",
    "                )\n",
    "            )\n",
    "        \n",
    "        return InferenceResponse(model_name=self.name, outputs=outputs)\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "communist-childhood",
   "metadata": {},
   "source": [
    "We can now restart our instance of MLServer (i.e. with the `mlserver start .` command) and send a few test requests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "underlying-judgment",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "payload = {\n",
    "    \"inputs\": [\n",
    "        {\n",
    "            \"name\": \"image-int32\",\n",
    "            \"datatype\": \"INT32\",\n",
    "            \"shape\": [8, 8],\n",
    "            \"data\": [\n",
    "                1, 0, 1, 0, 1, 0, 1, 0,\n",
    "                1, 0, 1, 0, 1, 0, 1, 0,\n",
    "                1, 0, 1, 0, 1, 0, 1, 0,\n",
    "                1, 0, 1, 0, 1, 0, 1, 0,\n",
    "                1, 0, 1, 0, 1, 0, 1, 0,\n",
    "                1, 0, 1, 0, 1, 0, 1, 0,\n",
    "                1, 0, 1, 0, 1, 0, 1, 0,\n",
    "                1, 0, 1, 0, 1, 0, 1, 0\n",
    "            ],\n",
    "            \"parameters\": {\n",
    "                \"content_type\": \"img\"\n",
    "            }\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"image-bytes\",\n",
    "            \"datatype\": \"BYTES\",\n",
    "            \"shape\": [8, 8],\n",
    "            \"data\": (\n",
    "                \"10101010\"\n",
    "                \"10101010\"\n",
    "                \"10101010\"\n",
    "                \"10101010\"\n",
    "                \"10101010\"\n",
    "                \"10101010\"\n",
    "                \"10101010\"\n",
    "                \"10101010\"\n",
    "            ),\n",
    "            \"parameters\": {\n",
    "                \"content_type\": \"img\"\n",
    "            }\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "\n",
    "response = requests.post(\n",
    "    \"http://localhost:8080/v2/models/content-type-example/infer\",\n",
    "    json=payload\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "creative-roads",
   "metadata": {},
   "source": [
    "As you should be able to see in the MLServer logs, the server is now able to decode the payload into a Pillow image.\n",
    "This example also illustrates how `Codec` objects can be compatible with multiple `datatype` values (e.g. tensor and `BYTES` in this case)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "flexible-series",
   "metadata": {},
   "source": [
    "## Request Codecs\n",
    "\n",
    "So far, we've seen how you can specify codecs so that they get applied at the input level.\n",
    "However, it is also possible to use request-wide codecs that aggregate multiple inputs to decode the payload.\n",
    "This is usually relevant for cases where the models expect a multi-column input type, like a Pandas DataFrame.\n",
    "\n",
    "To illustrate this, we will first tweak our `EchoRuntime` so that it prints the decoded contents at the request level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "identical-somerset",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting runtime.py\n"
     ]
    }
    ],
   "source": [
    "%%writefile runtime.py\n",
    "import json\n",
    "\n",
    "from mlserver import MLModel\n",
    "from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput\n",
    "from mlserver.codecs import DecodedParameterName\n",
    "\n",
    "_to_exclude = {\n",
    "    \"parameters\": {DecodedParameterName},\n",
    "    \"inputs\": {\"__all__\": {\"parameters\": {DecodedParameterName}}}\n",
    "}\n",
    "\n",
    "class EchoRuntime(MLModel):\n",
    "    async def predict(self, payload: InferenceRequest) -> InferenceResponse:\n",
    "        print(\"------ Encoded Input (request) ------\")\n",
    "        as_dict = payload.dict(exclude=_to_exclude)  # type: ignore\n",
    "        print(json.dumps(as_dict, indent=2))\n",
    "        print(\"------ Decoded input (request) ------\")\n",
    "        decoded_request = None\n",
    "        if payload.parameters:\n",
    "            # Fall back to `None` if no request-level codec decoded the payload\n",
    "            decoded_request = getattr(payload.parameters, DecodedParameterName, None)\n",
    "        print(decoded_request)\n",
    "            \n",
    "        outputs = []\n",
    "        for request_input in payload.inputs:\n",
    "            outputs.append(\n",
    "                ResponseOutput(\n",
    "                    name=request_input.name,\n",
    "                    datatype=request_input.datatype,\n",
    "                    shape=request_input.shape,\n",
    "                    data=request_input.data\n",
    "                )\n",
    "            )\n",
    "        \n",
    "        return InferenceResponse(model_name=self.name, outputs=outputs)\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "labeled-onion",
   "metadata": {},
   "source": [
    "We can now restart our instance of MLServer (i.e. with the `mlserver start .` command) and send a few test requests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "satellite-texas",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "payload = {\n",
    "    \"inputs\": [\n",
    "        {\n",
    "            \"name\": \"parameters-np\",\n",
    "            \"datatype\": \"INT32\",\n",
    "            \"shape\": [2, 2],\n",
    "            \"data\": [1, 2, 3, 4],\n",
    "            \"parameters\": {\n",
    "                \"content_type\": \"np\"\n",
    "            }\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"parameters-str\",\n",
    "            \"datatype\": \"BYTES\",\n",
    "            \"shape\": [2, 11],\n",
    "            \"data\": [\"hello world 😁\", \"bye bye 😁\"],\n",
    "            \"parameters\": {\n",
    "                \"content_type\": \"str\"\n",
    "            }\n",
    "        }\n",
    "    ],\n",
    "    \"parameters\": {\n",
    "        \"content_type\": \"pd\"\n",
    "    }\n",
    "}\n",
    "\n",
    "response = requests.post(\n",
    "    \"http://localhost:8080/v2/models/content-type-example/infer\",\n",
    "    json=payload\n",
    ")"
   ]
  },
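  {
   "cell_type": "markdown",
   "id": "sketch-request-dataframe",
   "metadata": {},
   "source": [
    "As a rough picture of what a request-wide `pd` content type asks for, the sketch below assembles a Pandas DataFrame by hand from the two inputs above.\n",
    "It assumes that each input becomes one column named after the input, and that each row of the 2x2 tensor ends up as a single cell value.\n",
    "This is an assumption made for illustration rather than a description of MLServer's own codec, so the decoded object printed in the server logs may be laid out differently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-request-dataframe-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Decode each input on its own first (same values as in the request above)\n",
    "np_column = np.array([1, 2, 3, 4], dtype=np.int32).reshape(2, 2)\n",
    "str_column = [\"hello world 😁\", \"bye bye 😁\"]\n",
    "\n",
    "# Assumption: one column per input, keyed by the input name.\n",
    "# Each row of the 2x2 array is stored as a list in a single cell.\n",
    "frame = pd.DataFrame(\n",
    "    {\n",
    "        \"parameters-np\": np_column.tolist(),\n",
    "        \"parameters-str\": str_column,\n",
    "    }\n",
    ")\n",
    "print(frame)"
   ]
  },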
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "figured-member",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
